Heavy-tailed statistics in short-message communication 
Short-message (SM) communication is one of the most frequently used
communication channels in modern society. In this Brief Report, based on
the SM communication records provided by several volunteers, we
investigate the statistics of SM communication patterns, including the
interevent time distributions between two consecutive short messages and
between two consecutive conversations, and the distribution of the number
of messages contained in a complete conversation. At the individual
level, the empirical data provide strong evidence that the human activity
pattern, which exhibits a heavy-tailed interevent time distribution, is
non-Poisson in nature.

I. INTRODUCTION



Information communication forms the basis of social relations. In modern
science, statistical analysis of communication databases has become one
of the most important approaches to revealing social structure [1]. For
example, the communication structures based on e-mail [?] and phone
calls [?] display scale-free and small-world properties, which are
ubiquitous in various social networks. For lack of long-term standard
databases of human communication activities, prior studies on social
communication systems usually simply assume that the temporal occurrence
of contacts between two people is uniform. That is to say, given two
nodes in a social acquaintance network, the probability that a new
contact (e.g., telephone call, SM communication, on-line instant chat,
e-mail communication, etc.) occurs is the same at any time, which
corresponds to a Poisson process with an exponential distribution of
interevent times between two consecutive contacts. However, recent
empirical investigations on e-mail [4] and surface mail [?]
communication show a far different scenario: those communication
patterns follow non-Poisson statistics, characterized by bursts of
rapidly occurring events separated by long gaps. That is, the interevent
time distribution has a much fatter tail than the exponential form and is
approximately a power law. Similar statistical properties have also been
found in many other human behaviors [?], including market transactions
[?, ?], on-line game playing [?], movie watching [10], web browsing [11],
and so on. Those empirical statistics clearly indicate the invalidity of
the Poisson process in mimicking human dynamics in many real-life
systems. Motivated by this empirical evidence, scientists seek to uncover
the origin of heavy tails in human dynamics, as well as to reveal the
effect of non-Poisson statistics on dynamical processes in social
systems. On the former aspect, Barabasi et al. suggested the
highest-priority-first (HPF) protocol as a potential origin [?, ?];
however, this queuing model may not explain all the possible mechanisms
leading to a heavy tail [13]. On the latter aspect, only a few works,
about epidemic spreading, have been reported so far [?].
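
The uniform-contact assumption described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original study: the contact probability `p`, the number of steps, and all function names are hypothetical choices made for illustration. It draws contacts with the same probability at every time step and compares the empirical fraction of gaps longer than t with the memoryless prediction (1 - p)^t, which decays exponentially and therefore has no heavy tail.

```python
import random

def simulate_uniform_contacts(p, n_steps, seed=0):
    """Return the time steps at which a contact occurs when every step
    independently hosts a contact with the same probability p."""
    rng = random.Random(seed)
    return [t for t in range(n_steps) if rng.random() < p]

def interevent_times(event_times):
    """Gaps between consecutive events."""
    return [b - a for a, b in zip(event_times, event_times[1:])]

def survival(gaps, t):
    """Empirical fraction of gaps strictly longer than t."""
    return sum(g > t for g in gaps) / len(gaps)

p = 0.05  # hypothetical contact probability per hour (an assumption)
gaps = interevent_times(simulate_uniform_contacts(p, 200_000))
for t in (10, 20, 40, 80):
    # Memoryless prediction: P(gap > t) = (1 - p) ** t.
    print(t, round(survival(gaps, t), 4), round((1 - p) ** t, 4))
```

On a log-log plot such a distribution bends sharply downward, in contrast to the approximately straight lines expected for a power law.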

Owing to the rapid progress of wireless technology, SM communication has
become one of the most important social contact tools in modern society.
In this Brief Report, based on the SM communication records provided by
several volunteers, we investigate the statistics of SM communication
patterns, including the interevent time distributions between two
consecutive short messages and between two consecutive conversations, and
the distribution of the number of messages contained in a complete
conversation. At the individual level, the empirical data provide strong
evidence that the human activity pattern, which exhibits a heavy-tailed
interevent time distribution, is non-Poisson in nature.



II. DATA 



In this Brief Report, the SM records of eight volunteers (labeled A to H)
are investigated. The volunteers comprise one company manager (C) and
seven university students (the others). The overall time spans of the
records range from three to six months. Every record contains the sending
times of the SMs, and the records A, C, E and F also contain the cell
phone numbers of the receivers. Some basic properties of the records are
listed in Table I.

We focus on the common properties shared by different records, since
those may imply some general statistical characteristics of human
temporal activities. Time is coarse-grained at a resolution of one hour.
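
As an illustration of the one-hour coarse-graining, the sketch below turns a list of sending times into interevent times measured in whole hours. The source does not specify the exact procedure, so this is only one plausible reading: each timestamp is assigned to the hour it falls in, and the gap is the difference between hour indices. The function names and the sample timestamps are hypothetical.

```python
from datetime import datetime

def floor_to_hour(ts):
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)

def hourly_interevent_times(send_times):
    """Interevent times in whole hours between consecutive messages.

    Assumption: each message is assigned to the hour it falls in, so two
    messages sent within the same hour have a gap of 0.
    """
    if len(send_times) < 2:
        return []
    ordered = sorted(send_times)
    origin = floor_to_hour(ordered[0])
    hours = [int((ts - origin).total_seconds() // 3600) for ts in ordered]
    return [b - a for a, b in zip(hours, hours[1:])]

# Hypothetical sending times, for illustration only.
sample = [
    datetime(2008, 3, 1, 9, 15),
    datetime(2008, 3, 1, 9, 40),
    datetime(2008, 3, 1, 11, 5),
    datetime(2008, 3, 2, 8, 30),
]
print(hourly_interevent_times(sample))  # [0, 2, 21]
```

Gaps of zero cannot be placed on a log-log plot, so an analysis along these lines would have to either drop them or shift the hour index; which choice the authors made is not stated.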

FIG. 1: (color online) The log-log plots of interevent time
distributions. The black dots represent the empirical data,
and the red lines are the linear fittings. The panels (a)-(h)
correspond to the records of A-H, respectively.



TABLE I: Properties of the records provided by the volunteers.

Record   Time span   Total number    Average number of
         (months)    of SMs          SMs sent per day
A        6           1528            8.7
B        6           4844            26.8
C        6           5987            33.6
D        3           3780            41.1
E        6           4523 (2263^a)   25.4
F        4.5         5778            41.6
G        5           5346            34.9
H        5.5         3734            22.1

^a Number of SMs recording the cell phone numbers of receivers.



III. EMPIRICAL RESULTS 

In this section, we present the empirical results for (i) the interevent
time distribution of sending SMs, (ii) the distribution of interevent
times between two consecutive conversations, and (iii) the length
distribution of conversations. The results indicate that the
communication behaviors of different users share some common properties,
especially the heavy tails observed in all those distributions.

FIG. 2: Dependence between the exponent $\alpha$ of each distribution
and its daily average number, $n_d$.

As shown in Fig. 1, the interevent time distribution can be well fitted
by a power-law form $P(\tau) \propto \tau^{-\alpha}$, where the exponent
$\alpha$ is between 1.5 and 2.1 for different users. In each curve, an
obvious peak appears at about ten hours, which is related to the human
physiological cycle: the sleeping time is roughly ten hours. Besides, as
shown in Fig. 2, there is an apparent positive correlation between the
average number of SMs sent per day and the exponent $\alpha$. This
phenomenon, similar to the observations in on-line movie watching [10],
is against the hypothesis [12] of discrete universality classes of human
dynamics.
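
The figure captions describe the fits as linear fittings on log-log plots. A minimal sketch of that kind of estimate is given below: it builds the empirical distribution of positive gaps and takes the least-squares slope of log P against log tau, so that the exponent is minus the slope. The function names and the synthetic gap counts are hypothetical; the authors' exact fitting range and binning are not given in the text, and maximum-likelihood estimators are generally considered more reliable than this simple regression.

```python
import math
from collections import Counter

def empirical_distribution(gaps):
    """Map each positive gap value to its relative frequency."""
    positive = [g for g in gaps if g > 0]
    counts = Counter(positive)
    total = len(positive)
    return {tau: c / total for tau, c in sorted(counts.items())}

def loglog_slope(dist):
    """Least-squares slope of log P(tau) against log tau."""
    xs = [math.log(tau) for tau in dist]
    ys = [math.log(prob) for prob in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def fit_exponent(gaps):
    """Estimate alpha in P(tau) ~ tau ** (-alpha) as minus the slope."""
    return -loglog_slope(empirical_distribution(gaps))

# Synthetic gaps whose counts fall off exactly as tau ** -2 (not real data).
synthetic_counts = {1: 3600, 2: 900, 3: 400, 4: 225, 5: 144, 6: 100}
synthetic_gaps = [tau for tau, c in synthetic_counts.items() for _ in range(c)]
print(fit_exponent(synthetic_gaps))  # close to 2.0
```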

Generally speaking, in SM communication a user often sends messages to,
and receives messages from, one person consecutively. According to daily
experience, people engaged in SM communication often need several
exchanges of messages to build a complete conversation. Therefore, we
define a conversation as the SMs that are sent to one person
consecutively, without being interrupted by messages to other persons.
Furthermore, the interevent time between two consecutive conversations is
defined as the time difference between their beginning times. Only the
samples A, C, E and F are investigated, since their records contain the
phone numbers of the receivers. As shown in Fig. 3, the interevent time
distributions of conversations can be well fitted by power laws, and the
exponents are smaller than those of the corresponding distributions of SM
sending. Actually, a conversation rather than a single SM can better
characterize the communication pattern, because a conversation is
functionally complete. Yet it should be noted that, in the real
communication process, one complete conversation may be interrupted by
other correspondents, so that a single conversation may be counted as
several small ones. Since these cases cannot be automatically
discriminated from the empirical data, they may cause some bias. However,
the statistical characteristics shown in Fig. 3, such as the heavy tails,
are reliable, because the range of the distribution would be enlarged and
its tail would be even fatter if the bias were eliminated. To
characterize the strength of a conversation, we define the length of a
conversation as the number of SMs it contains. As shown in Fig. 4, every
length distribution is dominated by a power law with an exponent larger
than 2.

FIG. 3: (color online) The interevent time distributions of
conversations in log-log plots. The black dots represent empirical data,
and the red lines are the linear fittings. The panels (a), (b), (c) and
(d) correspond to the records of A, C, E and F, respectively.

FIG. 4: (color online) The log-log plots of length distributions of
conversations. The black dots represent empirical data, and the red lines
are the linear fittings. The panels (a), (b), (c) and (d) correspond to
the records of A, C, E and F, respectively.
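
The conversation definition used in this section can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: a record is taken to be a time-ordered list of (hour, receiver) pairs, the receiver labels and sample values are hypothetical, and each maximal run of consecutive messages to the same receiver is treated as one conversation. It returns the starting time, receiver and length of each conversation, and the interevent times between consecutive conversation starts.

```python
from itertools import groupby

def split_conversations(messages):
    """Group a time-ordered list of (hour, receiver) pairs into
    conversations: maximal runs of consecutive messages to one receiver.

    Returns a list of (start_hour, receiver, length) tuples.
    """
    conversations = []
    for receiver, run in groupby(messages, key=lambda m: m[1]):
        run = list(run)
        conversations.append((run[0][0], receiver, len(run)))
    return conversations

def conversation_interevent_times(conversations):
    """Differences between the beginning times of consecutive conversations."""
    starts = [start for start, _, _ in conversations]
    return [b - a for a, b in zip(starts, starts[1:])]

# Hypothetical record: (hour index, receiver label).
record = [(0, "x"), (0, "x"), (1, "x"), (5, "y"), (6, "x"), (6, "x"), (30, "z")]
convs = split_conversations(record)
print(convs)                                 # [(0, 'x', 3), (5, 'y', 1), (6, 'x', 2), (30, 'z', 1)]
print(conversation_interevent_times(convs))  # [5, 1, 24]
print([length for _, _, length in convs])    # [3, 1, 2, 1]
```

In this sample the exchange with "x" is split into two conversations by the single message to "y", which is exactly the kind of interruption bias discussed above.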



IV. CONCLUSION AND DISCUSSION 

In this Brief Report, we have investigated some statistical properties of
SM communication at the individual level. The empirical evidence
indicates that the SM communication pattern is governed by non-Poisson
statistics. For commercial reasons and because of the right to personal
privacy, we could not freely and automatically download SM data without
permission. Therefore, the results are limited by the lack of data.
Despite the small number of samples, all the records display very similar
statistics; thus we believe that the findings shown in this Brief Report
are common to most SM users.

The temporal statistics of SM communication are similar to those observed
in e-mail [1] and surface mail [?] communications. Besides this
similarity, SM communication has some specific features that differ from
e-mail and surface mail. Firstly, it is intuitive to treat surface mail
as tasks waiting for reply. We may reply soon to urgent and important
letters, while those that are unimportant or difficult to answer may wait
a long time before being replied to. In contrast, we usually reply to a
short message immediately. A short message is seldom so hard to answer
that we have to spend a long time preparing the response. Therefore,
instead of the HPF protocol, the heavy-tailed distribution may be rooted
in some other mechanisms, such as the competition with other tasks [12],
personal interest [13], the social interactions among many users [?], and
so on. Secondly, a single short message is usually a tiny part of a
complete conversation; thus studying the statistics at the resolution of
conversations may better reflect the function of SM communication.

Based on the analytical solution [16] of the Barabasi model [4], Vazquez
et al. [12] claimed the existence of two discrete universality classes of
human dynamics, whose characteristic power-law exponents are 1 and 1.5,
respectively. E-mail communication, web browsing and library loans belong
to the former, while surface mail communication belongs to the latter.
However, there is increasing empirical evidence against the hypothesis of
universality classes for human dynamics [?]. As shown in Fig. 1,
different individuals have different power-law exponents, which are
typically larger than 1.5. Furthermore, Fig. 2 shows a positive
correlation between activity and power-law exponent, which is against
discrete universality classes. However, we also note that the power-law
exponents of the interevent time distributions of conversations are
closer to 1.5. A clearer picture calls for more abundant data in the
future.